Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes

نویسندگان

  • Ling-Hui Chen
  • Zhen-Hua Ling
  • Li-Rong Dai
چکیده

This paper presents a deep neural network (DNN) based spectral envelope conversion method. A global DNN is employed to model the complex non-linear mapping relationship between the spectral envelopes of source and target speakers. The proposed DNN is generatively trained layer-by-layer by cascade of two restricted Boltzmann machines (RBMs) and a bidirectional associative memory (BAM), which are considered as generative models estimated using the contrastive divergence algorithm. Further, multiple spectral envelopes are adopted instead of dynamic features for better modeling using the DNN. The superiority of the proposed method is validated by the subjective experimental results.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion

This paper introduces the methods we adopt to build our system for the evaluation event of Voice Conversion Challenge (VCC) 2016. We propose to use neural network-based approaches to convert both spectral and excitation features. First, the generatively trained deep neural network (GTDNN) is adopted for spectral envelope conversion after the spectral envelopes have been pre-processed by frequen...

متن کامل

Emotional Voice Conversion Using Neural Networks with Different Temporal Scales of F0 based on Wavelet Transform

An artificial neural network is one of the most important models for training features of voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as mel cepstral coefficients (MCC) which represent the spectrum features. However, a simple representation for fundamental frequency (F0) is not enough for neural networks to deal with an...

متن کامل

A Powerful Generative Model Using Random Weights for the Deep Image Representation

To what extent is the success of deep visualization due to the training? Could we do deep visualization using untrained, random weight networks? To address this issue, we explore new and powerful generative models for three popular deep visualization tasks using untrained, random weight convolutional neural networks. First we invert representations in feature spaces and reconstruct images from ...

متن کامل

Transformation of spectral envelope for voice conversion based on radial basis function networks

This paper presents a novel algorithm that modifies the speech uttered by a source speaker to sound as if produced by a target speaker. In particular, we address the issue of transformation of the vocal tract characteristics from one speaker to another. The approach is based on estimating spectral envelopes using radial basis function (RBF) networks, which is one of the well-known models of art...

متن کامل

A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences

We extend our recently proposed approach to cross-lingual TTS training to voice conversion, without using parallel training sentences. It employs Speaker Independent, Deep Neural Net (SIDNN) ASR to equalize the difference between source and target speakers and Kullback-Leibler Divergence (KLD) to convert spectral parameters probabilistically in the phonetic space via ASR senone posterior probab...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014